Implement FP32 kleidiai Gemv #26302

JonathanC-ARM · 2025-10-14T15:45:44Z

Description

Implementation of a special SGEMM path that uses GEMV kernels in cases where M or N is 1.
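As a rough illustration of the dispatch condition described above, the sketch below picks a GEMV path whenever either output dimension is 1, and includes a plain reference GEMV for the M == 1 case. `SgemmPath`, `SelectSgemmPath`, and `ReferenceGemvRow` are hypothetical names invented for this example; they are not the MLAS or KleidiAI functions used in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; this is not the MLAS or KleidiAI API.
enum class SgemmPath { Gemv, Gemm };

// C[M x N] = A[M x K] * B[K x N] degenerates to a matrix-vector product
// when either output dimension is 1.
inline SgemmPath SelectSgemmPath(std::size_t M, std::size_t N) {
    return (M == 1 || N == 1) ? SgemmPath::Gemv : SgemmPath::Gemm;
}

// Reference GEMV for the M == 1 case, row-major:
// y[1 x N] = x[1 x K] * B[K x N].
inline std::vector<float> ReferenceGemvRow(const std::vector<float>& x,
                                           const std::vector<float>& B,
                                           std::size_t K, std::size_t N) {
    assert(x.size() == K && B.size() == K * N);
    std::vector<float> y(N, 0.0f);
    for (std::size_t k = 0; k < K; ++k) {
        for (std::size_t n = 0; n < N; ++n) {
            y[n] += x[k] * B[k * N + n];
        }
    }
    return y;
}
```

The N == 1 case is the same computation with the roles of the two operands swapped, which is why a single predicate on M and N is enough to route both shapes.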

Additionally, this PR introduces a microkernel interface that uses typedefs provided by KleidiAI, which simplifies the code and removes things such as ternary operations for choosing between SME1 and SME2 kernels.
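One way to picture the interface idea is a struct of function pointers that is selected once, so call sites no longer repeat an SME1-vs-SME2 choice. The sketch below is a minimal stand-in under that assumption: `GemvUKernel`, the `Sme1*`/`Sme2*` functions, and the step values are all hypothetical and are not taken from KleidiAI.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a KleidiAI-style micro-kernel interface typedef:
// a plain struct of function pointers. All names and values are illustrative.
struct GemvUKernel {
    std::size_t (*get_n_step)();
    const char* (*get_name)();
};

inline std::size_t Sme1NStep() { return 16; }  // placeholder value
inline const char* Sme1Name() { return "gemv_sme1"; }
inline std::size_t Sme2NStep() { return 32; }  // placeholder value
inline const char* Sme2Name() { return "gemv_sme2"; }

// The SME1-vs-SME2 decision is made once, in this function.
inline const GemvUKernel& SelectGemvUKernel(bool has_sme2) {
    static const GemvUKernel sme1{&Sme1NStep, &Sme1Name};
    static const GemvUKernel sme2{&Sme2NStep, &Sme2Name};
    return has_sme2 ? sme2 : sme1;
}

// Call sites go through the table and need no per-call ternary.
inline std::size_t NStepFor(const GemvUKernel& uk) {
    assert(uk.get_n_step != nullptr);
    return uk.get_n_step();
}
```

With this shape, code that previously wrote a ternary at every call can hold one `const GemvUKernel&` and call through it.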

Indicative Performance

In lieu of any production models in which GEMV was a large contributor to the network, I opted to create a mini model for testing that contains thousands of randomized matmul variants, with GEMV cases distributed throughout.

Using onnxruntime perf test, I was able to halve the total inference time versus MLAS with this model.

More benchmarks to come shortly.

Signed-off-by: Jonathan Clohessy <[email protected]>

hariharans29 · 2025-10-14T19:50:14Z

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

azure-pipelines · 2025-10-14T19:50:34Z

Azure Pipelines successfully started running 4 pipeline(s).

hariharans29 · 2025-10-16T17:10:08Z

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

azure-pipelines · 2025-10-16T17:10:27Z

Azure Pipelines successfully started running 4 pipeline(s).

hariharans29 · 2025-10-22T17:19:31Z

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

azure-pipelines · 2025-10-22T17:19:52Z

Azure Pipelines successfully started running 4 pipeline(s).

edgchen1 · 2025-10-23T15:31:45Z

onnxruntime/core/mlas/lib/kleidiai/sgemm_kleidiai.cpp

+kai_matmul_clamp_f32_f32p_f32p_ukernel sgemm_gemm = GetKleidiAISGemmUKernel();
+kai_matmul_clamp_f32_f32_f32p_ukernel sgemm_gemv = GetKleidiAISGemvUKernel();


GetKleidiAIXUKernel() returns const&. Do we need to make a copy here?

Suggested change

-kai_matmul_clamp_f32_f32p_f32p_ukernel sgemm_gemm = GetKleidiAISGemmUKernel();
-kai_matmul_clamp_f32_f32_f32p_ukernel sgemm_gemv = GetKleidiAISGemvUKernel();
+const kai_matmul_clamp_f32_f32p_f32p_ukernel& sgemm_gemm = GetKleidiAISGemmUKernel();
+const kai_matmul_clamp_f32_f32_f32p_ukernel& sgemm_gemv = GetKleidiAISGemvUKernel();

Updated to const in the latest push.
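The point of the review comment above can be checked with a small stand-alone experiment: when a getter returns `const&`, initializing a by-value local runs the copy constructor, while binding a `const&` local does not. `UKernelTable`, `GetTable`, and the counter are hypothetical names for this sketch and do not correspond to the KleidiAI types in the diff.

```cpp
// Counts copy-constructor calls so the difference is observable.
static int g_copy_count = 0;

// Hypothetical stand-in for a kernel-interface struct returned by reference.
struct UKernelTable {
    int id;
    explicit UKernelTable(int i) : id(i) {}
    UKernelTable(const UKernelTable& other) : id(other.id) { ++g_copy_count; }
};

// Returns a reference to a function-local static, similar in shape to a
// getter that returns const&.
inline const UKernelTable& GetTable() {
    static const UKernelTable table{7};
    return table;
}

// Binding by value: the local is copy-initialized from the referenced object.
inline int CopiesWhenBindingByValue() {
    const int before = g_copy_count;
    UKernelTable local = GetTable();
    (void)local;
    return g_copy_count - before;
}

// Binding by const reference: no new object is created.
inline int CopiesWhenBindingByConstRef() {
    const int before = g_copy_count;
    const UKernelTable& local = GetTable();
    (void)local;
    return g_copy_count - before;
}
```

For a small struct of function pointers the copy is cheap, so the change is mostly about stating intent and avoiding needless work, not a measurable speedup.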

edgchen1 · 2025-10-23T15:38:10Z

onnxruntime/core/mlas/lib/qgemm.cpp

    //No fallback and putting in guards
-    if(MLAS_CPUIDINFO::GetCPUIDInfo().HasArm_SME()){
-    ArmKleidiAI::MlasDynamicQGemmBatch(Shape, DataParams, BatchN, ThreadPool);
+    if(ArmKleidiAI::SMEInfo::CanUseSME2){


There are other places that need to be updated, like:

onnxruntime/onnxruntime/contrib_ops/cpu/quantization/dynamic_quantize_matmul.cc

Line 218 in b3ba580

if (!CPUIDInfo::GetCPUIDInfo().HasArm_SME()) {

onnxruntime/onnxruntime/test/mlas/unittest/test_dynamic_qgemm.cpp

Line 24 in b3ba580

if (!MLAS_CPUIDINFO::GetCPUIDInfo().HasArm_SME()) {

I might be missing some.

I think it would be worth making a helper function like MlasIsDynamicQGemmAvailable that has the appropriate checks and using that instead.

Added the updated checks in these and similar places in the latest push.
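The helper suggested in the review could take roughly the following shape: one predicate that owns the capability check, which operator code and tests then call instead of repeating `HasArm_SME()`. This is only a sketch under assumptions. `CpuFeatures` and its fields are hypothetical, the function name is borrowed from the suggestion, and the real conditions in MLAS may differ.

```cpp
// Hypothetical feature summary; the real code would query the CPU and build
// configuration instead of taking a struct like this.
struct CpuFeatures {
    bool has_sme;           // SME1 present
    bool has_sme2;          // SME2 present
    bool kleidiai_enabled;  // KleidiAI compiled in and enabled
};

// Single place that decides whether the dynamic QGEMM path may be used.
// Assumption: the path needs KleidiAI and SME2, mirroring the diff above,
// which replaces an SME check with an SME2 check.
inline bool MlasIsDynamicQGemmAvailable(const CpuFeatures& f) {
    return f.kleidiai_enabled && f.has_sme2;
}
```

Keeping the condition in one function means a later change to the requirement, such as the SME1-to-SME2 switch in this PR, touches a single definition and cannot be missed at individual call sites.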

Signed-off-by: Jonathan Clohessy <[email protected]>

Implement FP32 kleidiai Gemv

f201162

Signed-off-by: Jonathan Clohessy <[email protected]>

edgchen1 reviewed Oct 23, 2025

patryk-kaiser-ARM mentioned this pull request Oct 24, 2025

Fix: Disable KleidiAI on systems with SME1 but not SME2 #26399

Closed

JonathanC-ARM force-pushed the jclohess_kleidiai_gemv_implementation branch from 615a469 to a3f4f5b on October 24, 2025 15:43

JonathanC-ARM and others added 2 commits October 24, 2025 16:45

update kleidi version to fix missing header file

e0668fc

Signed-off-by: Jonathan Clohessy <[email protected]>

Update const for kernel interface and sme checks

e8ab1b1

Signed-off-by: Jonathan Clohessy <[email protected]>

JonathanC-ARM force-pushed the jclohess_kleidiai_gemv_implementation branch from a3f4f5b to e8ab1b1 on October 24, 2025 15:46

Merge branch 'microsoft:main' into jclohess_kleidiai_gemv_implementation

c9dab46

4 participants
